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ABSTRACT 

N-body simulations are widely used to simulate the dynamical evolution of a variety 
of systems, among them star clusters. Much of our understanding of their evolution 
rests on the results of such direct N-body simulations. They provide insight in the 
structural evolution of star clusters, as well as into the occurrence of stellar exotica. 
Although the major pure N-body codes STARLAB/KIRA and NBODY4 are widely 
used for a range of applications, there is no thorough comparison study yet. 

Here we thoroughly compare basic quantities as derived from simulations per- 
formed either with STARLAB/KIRA or NBODY4. 

We construct a large number of star cluster models for various stellar mass function 
settings (but without stellar/binary evolution, primordial binaries, external tidal fields 
etc), evolve them in parallel with STARLAB/KIRA and NBODY4, analyse them in a 
consistent way and compare the averaged results quantitatively. For this quantitative 
comparison we develop a bootstrap algorithm for functional dependencies. 

We find an overall excellent agreement between the codes, both for the clusters' 
structural and energy parameters as well as for the properties of the dynamically 
created binaries. However, we identify small differences, like in the energy conservation 
before core collapse and the energies of escaping stars, which deserve further studies. 

Our results reassure the comparability and the possibility to combine results from 
these two major N-body codes, at least for the purely dynamical models (i.e. without 
stellar/binary evolution) we performed. Further detailed comparison studies for more 
complex systems, e.g. including stellar/binary evolution, are required. 

Key words: Methods: N-body simulations, Methods: statistical, open clusters and 
associations: general 



1 INTRODUCTION 



In recent years, stellar dynamics has led to an advance in a 
variety of fields, su ch as studies of individ ual star clusters 
llHurlev et al l l2005h . star cluster systems (|Vesperini et al.l 
120031 ). populations of "exotic" objects in star clusters 
(e.g. runaway merger produ cts with masses of up to 



Portegies Zwart et alj |2004 
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few thousands solar masses, 
blue stragglers, iHurlev et al.l 

120071 ). the formation and evolution of higher-order hi- 
erarchical systems (tripl es, quadruples and higher, e.g 
Ivan den Berk et al.l l2007h . the Galactic centre and run- 



away stars (jGi ialaiidris ct al. 20Q5|; iBaumgardt et al 1 120061 : 
lLockmann fc Baum gardt 200^), etc. 
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The major codes used in this field are the family of 
nbodyx codes l|Aar scth 1999; the most widely used ver- 
sions are nbody4, nbody6, nbody6++, the most recent 
version being NBODY7) and the STARLAB environment 
with its N-body integrator kira (|Portegies Zwart et al.l 
2001). Despite these codes being widely used, there is 
no thorough comparison study yet. First attempts have 
been initiated by Douglas Heggie and others at the IAU 
General Assembly 1997 in Kyoto (therefore, the "Ky- 
oto experiment"), however until today , the numbe r of 
results and their analysis is small (see iHeggid l200ll and 
http:/ /www. maths. ed.ac.uk/~heggie/kyotoII/kyotoII. html 
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for descriptions of this collaborative experiment and some 
of its results) . 

Although the fundamental integration scheme (4 th or- 
der ; block-timestep "Hermi te" predictor-corrector scheme, 



see Makino fc Aarsethll 19921 , the next section in this paper, 
or lAarsethll200l for a variety of technical details) is the same 
for both codes, severe differences in the treatment of bina- 
ries, stellar and binary evolution are present, plus naturally 
different implementations of otherwise comparable compo- 
nents. 

With this paper we start a series of publications to 
study the impact of differences in the input physics onto 
the results from both codes. We start with the most simple 
models, not including stellar evolution, external tidal fields 
or primordial binaries. More complex models, without the 
aforementioned restrictions, will be studied in upcoming pa- 
pers of this series. 

In these studies we concentrate on the statistical treat- 
ment of a large number of runs, represented by its median 
values and uncertainty ranges, as the results of single runs 
will natur ally diverge due to th e amplification of numeri- 
cal errors (jGoodman et al.ll 19931 ). This holds for two models 
with slightly different initial configurations and evolved with 
the same code, as well as the same initial model evolved with 
two different codes. We will not compare wall-clock times 
for the different runs (as this is dependent on a multitude of 
parameters, software and hardware settings), or parameters 
usually only relevant to code developers. Instead we want 
to provide the interested scientific user a guideline to the 
comparability of the codes studied and point at differences 
concerning parameters likely relevant for the user. 

The paper is organised as follows: Sect. [2] gives an 
overview of similarities and differences of the STARLAB and 
NBODY4 input physics. Sect. [3] describes the general model 
setup and the data analysis pipeline. In Sect.[4]we introduce 
a bootstrap approach to quantify differences between func- 
tions. In Sect. [5]we present our results for a range of mass 
function settings. Energy conservation is studied in Sect.[6l 
core collapse in Sect. [7] and the properties of stars becoming 
unbound during cluster evolution in Sect. [8] We finish this 
paper with our conclusions in Sect. [9] 



2 OVERVIEW 

2.1 Similarities in input physics: N-body 
integrator scheme 

Almost all recent direct N-body integrators (including 
NBODY4 and kira/starlab) are based on the 4 th order, 
block-ti mestep "Hermite" predict or-corrector scheme (but 
see e.g. iNitadori fe Makinol 120081 for higher-order N-body 
integrators) . 

"Hermite" predictor-correc tor scheme: Th is integration 
scheme was first described by iMakinol (|l99ll ) . It is based 
on individual timesteps for every star, (approximate) pre- 
diction of all stars' positions and derivatives, and (accurate) 
calculation and correction (using Hermite interpolation) for 
a subset of stars at any given timestep. 

More specifically, the scheme comprises of the following 
steps to evolve one star i from the present time to to the 
time ti = to + At. The positions, velocities, acceleration 



and the first time derivative of the acceleration at time to are 
assumed to be known for all stars. This description follows 
iMakino fc Aarsethl (|l992T ). 

(i) Predict/extrapolate the positions, velocities, acceler- 
ations and the first time derivatives of the acceleration at 
time ti, using Taylor expansion (up to 4 th order for the po- 
sitions) with the quantities at time to, for all stars. 

(ii) Calculate for star i the acceleration and the first time 
derivative of the acceleration at time ti, based on the pre- 
dicted positions and velocities of all stars. 

(iii) Calculate for star i the second and third time deriva- 
tives of the acceleration at time to, using the acceleration 
and the first time derivative of the acceleration at times to 
and ti. 

(iv) Calculate for star i the correction to the predicted 
position and velocity, based on the second and third time 
derivatives of the acceleration at time to. 

Block timesteps: In principle, the optimum timestep can 
be estimated for each star individually, based on this star's 
acceleration and its time derivatives. In reality, it is compu- 
tationally favourable to group stars with approximately the 
same timestep together, and evolve whole "blocks" of stars 
at once. This treatment reduces the overheads otherwise 
needed to calculate the predicted positions and velocities 
of all stars. Conventionally, block timesteps of power-of-2 
(At n oc 1/2") are used. 



2.2 Differences in input physics: Treatment of 
binaries 

The treatment of binaries is one of the challenges in N-body 
simulations. While especially close binaries are dynamically 
important (e.g. star-binary and binary-binary interactions 
can eject stars from the cluster core, resulting in the halting 
of core collapse and leading to core re-expansion), their rel- 
evant timescale is the orbital period (of the order of days), 
while the relevant timescale for the cluster as a whole is the 
crossing timescale (of the order of Myrs). A "brute force" 
approach would need to set the timestep to a fraction of 
the orbital period to evolve the binary accurately, which 
would immediately stall the calculation (and corrupt energy 
conservation due to exponential growth of numerical inac- 
curacies) . 

However, especially close/hard binaries are hardly per- 
turbed by external effects, as their binding energy is high 
compared to the energy injected by external perturbations. 
Such binaries evolve essenti ally as in isolation. 

NBODY4 uses the KS ([Kustaanheimo fc Stiefell I1965T I 
and CHAIN (|Mikkola fc Aarsethlll993l ') regularisation tech- 
niques to follow close encounters between stars. The basic 
idea of these regularisation methods is to switch to special 
coordinate systems together with appropriate time trans- 
formations which significantly improve the overall energy 
conservation during close encounters. 

STARLAB separates between "unperturbed/hard bina- 
ries" and "perturbed binaries", where the distinction is 
made where the dimensionless perturbation (i.e. the ratio of 
the external perturbation to the internal binary binding en- 
ergy) reaches a critical value (typically 10 -6 ). Unperturbed 
binaries are evolved solving analytically the Kepler equa- 
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tions, and their components are treated as point masses, for 
the purpose of influencing other stars. 

"Slightly perturbed" bound pairs with a dimension- 
less perturbation between 10 -6 and 10 -5 are treated with 
a "slowdown" algo r ithm similar to the one described in 
iMikkola fc Aarsethl | | 19961 ). 

More strongly perturbed pairs and multiples are treated 
as resolved into their components for the purpose of deter- 
mining their influence of surrounding stars. Their motion is 
calculated directly. 

Perturbations are followed efficiently by keeping a 
perturber-list for each binary. 

More detailed inf ormation are given in 
IPortegies Zwart et alj (1200 lh 



3 MODEL SETUP AND DATA ANALYSIS 

In order to simplify future comparisons with other N- 
body codes, we provide the input snapshots (both in 
STARLAb/kira as well as in nbody format), the time 
evolutions of important cluster parameters, and for 
each code an example run parameter file on 
our webpage (http://www.phys.uu.nl/~anders/data/ 
NBODY _STARLAB_Comparison/ and 
http:/ /members. galev.org/nbody/ 
NBODY J3TARLAB_Comparison/). 

We created 50 input models per setting, using the 
appropriate STARLAB tasks, to improve the statistics. For 
NBODY4, these models were converted into the appropriate 
input format, to have maximum comparability. 

All input models were evolved for 1000 N- 
body time units, well beyond core collapse, using 
NBODY4 (the May 2008 version from Aarseth's web- 
page3) respectively STARLAb/kira (throughout the 
paper we use STARLAB version 4.4.2). All simulations 
were performed on PCs h osting GRAPE spe cial- 
purpose hardware (see e.g. iMakino et al .11 20031 ; the 
nbody 4 runs were performed on a machine hosting a 
GRAPE6A board, the STARLAB on a machine hosting 
a GRAPE6BLX board). As NBODY6 does not support 
the usage of GRAPE hardware, we limit our anal- 
ysis to NBODY4 in the course of this paper. For the 
impact of the hardware (and the associated internal 
calculation accu racy) on the r esults of our N-body 
simulations, see lAnders! (|2008T ). Preliminary results 
indicate little impact. 

Crucial for our comparison is a self-consistent analy- 
sis of the nbody4 and starlab/kira output. In order to 
achieve this goal we convert the STARLAB output into the 
NBODY4 output format (with the same number of signifi- 
cant digits = 15), removing all information not available 
for the nbody 4 output, like local densities, binary param- 
eters, energies, cluster centres etc. At this stage the binary 
tree structure is not yet established and the binary /multiple 
components are treated as single stars. 

The results, snapshot by snapshot, were then fed into 
starlab/kira and evolved for a short time (a l/32 th of an 
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N-body time unit), in order to reconstruct the binary pop- 
ulations. From the resulting snapshots we calculate a large 
number of diagnostic^: structural parameters (cluster cen- 
tres, mass profile, Lagrange radii, King parameter, core den- 
sity etc), energies (potential energy, kinetic energy, energy 
error etc), and parameters of dynamically created binaries 
(eccentricities, binding energies, positions of the binaries in- 
side the cluster etc). 

For the majority of this paper we will concentrate on 
the results from snapshots with fully reconstructed binary 
tree structure. The impact of the binary tree reconstruction 
on the data will be discussed in Sect. 15.41 



3.1 Nomenclature 

We will use the following definitions and abbreviations 
throughout the paper. 

• standard runs = std: simulations made with the stan- 
dard settings described in Sect. 13.21 

• MF10 runs: simulations made with the s t andar d set- 
tings described below, except that a ISalpeterl (|l955h mass 
function is used, with the upper mass limit being lOx larger 
than the lower mass limit 

• MF100 runs: simulations made with the standa rd set- 
tings described below, except that a ISalpeterl ( |l955j ) mass 
function is used, with the upper mass limit being lOOx larger 
than the lower mass limit 

• unperturbed binaries: relative external perturbation is 
smaller than 10 -6 

• perturbed binaries: relative external perturbation is 
larger than 10~ 6 , but binding energy |Ebmd| ^ 0.5 kT. 

• multiples: second strongest bound orbit in a multiple 
system (primary mass = total mass of inner binary with 
strongest bound orbit) 

• significance level of statistical test results: 

- highly significant: p-value < 1% 

- significant: p-value < 5% 

- weakly significant: p-value < 10% 

In the remainder of this work, for studying structural 
parameters and energies we will use only the median cluster 
parameters calculated from the individual runs. The associ- 
ated uncertainty ranges are the 16%/84% quantiles (similar 
to the la ranges for Gaussian-distributed quantities around 
their mean value), divided by the square-root of the num- 
ber of runs contributing, to estimate the uncertainty in the 
position of the median value. For studying the properties of 
dynamically created binaries, we add up all binaries from 
the individual runs which are present at a given time. 

This procedure reduces the noise from the individual 
runs. In addition, as runs inevitably diverge (either two start 
models with slightly different initial conditions evolved with 
the same code, or one start model evolved with two dif- 
ferent codes), a direct comparison of individual runs is not 
expected to give meaningful results. 



2 We used the analysis task HSYS.STATS in STARLAB, and recal- 
culate energies, core radii and other quantities requiring 0(N 2 ) 
operations where necessary. 
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3.2 Benchmark tests: The "standard runs" 

The benchmark test settings (further on referred to as "stan- 
dard runs") we propose are the following: 

(i) 1024 (=lk) particles 

(ii) equal mass system 

(iii) no primordial binaries 

(iv) no stellar /binar y star evolution 

(v) iPlummerl (|l91ll ) sphere density profile 

(vi) no external tidal field 

These settings constitute the most simple configuration, 
which is likely available for testing in every future N-body 
code. 

In further sections/papers, several of the restrictions im- 
posed to establish the benchmark test settings are going to 
be dropped, in order to get more realistic cluster models. 



4 DIFFERENCE OF FUNCTIONS & 
BOOTSTRAP TEST 

In our study we will obtain time evolutions of various pa- 
rameters, as computed for a variety of settings. We want to 
compare these time evolutions quantitatively and determine 
the statistical significance of differences. For the former one 
we define a measure how different two functional relations 
are, for the latter one we introduce a version of bootstrap- 
ping, adapted to such functional dependencies. 



4.1 Difference of functions: The distance measure 

Assume we have two functional dependencies of one pa- 
rameter from the independent variable x: yi(x) and J/2 (a;)- 
For each x these dependencies have uncertainties ai(x) 
and 02 (x). For example, from our studies, this relates to 
2/i (x) = rZ?° DY (time) and y 2 (x) = r s J£ RLAB (time) (i.e. 
the core radius at a given time, as determined from nbody 
respectively STARLAB simulations) and the related uncer- 
tainties. 

If the data have asymmetric error bars it J" (a;) and 
a^(x), e.g. originating from the use of quantiles, we suggest 
to use an average ai(x) = 0.5 ■ (a^(x) + a^(x)). However, 
other measures are possible and, as long as consistency is 
ensured, should give similar results. 

The relative difference between the functional depen- 
dencies at a given x is then: 



r 12 = A.£|M*)I 



(3) 



Sll(x) 



yi(x) - y 2 (x) 



(1) 



v / ai(x) 2 + a 2 (x) 2 
We then define the "difference between functions 1 and 



2" as 



A12 = TT 



Em*; 



(2) 



where N is the number of datapoints used for the statis- 
tic. We consider only the absolute value, as we want to have 
a measure of the size of the difference, but not necessarily 
its direction. In addition, this ensures A12 = A21. 

Equivalently we define the "absolute difference between 
functions 1 and 2" as 



While A12 is more sensitive to systematic offsets, Ti2 
traces also statistical fluctuations. 



4.2 Bootstrap test for comparing functions 

We calculate 3 x 300 test clusters with STARLAB using the 
same analysis routines as for the other clusters. These clus- 
ters follow the same settings as the main simulations (i.e. 
300 clusters for each respective mass function). 

From these test clusters we randomly select sets of 50 
clusters each (i.e. the number of clusters in the main simula- 
tions) with replacement, and calculate for each parameter 
the median y T (x) and quantiles <r T (x). 

We build 2000 such sets. Out of those we randomly 
select two sets (again with replacement) and derive the in- 
dividual values of Af 2 and r^. We repeat this procedure 
10000 times to estimate the Af 2 and r^ 2 test distributions 
for each parameter. As all test clusters are calculated with 
the same settings, the Af 2 and T12 test distributions repre- 
sent the null hypothesis "functions 1 and 2 are drawn from 
the same parent distribution" . By comparing these test dis- 
tributions with the values derived from the main simulations 
A 12 and Ff 2 we can quantify the fraction of data in the test 
distribution with A12 ° r ^12 more deviating than the values 
derived from the main simulations Af 2 or rf 2 . This value 
serves as measure of how similar the two main simulations 
are. 

In order to evaluate if the 300 test clusters were suffi- 
cient, we performed the same analysis with a subset of 250 
test clusters for the "MF10" setting. Depending on the pa- 
rameter studied, the resulting comparability p- values differ 
on average by ±1% up to maximum deviations of ±3%. None 
of these differences changed the significance level of any of 
the results, though. 

In order to avoid applying the test statistic to highly 
correlated data, which appears for the earliest timesteps (as 
the NBODY4 and STARLAb/kira runs share the same start 
models) and which is beyond the area of application of the 
test statistic, we start the summation in Eq. [2] and [3] at 10 N- 
body time units. This value is a compromise between avoid- 
ing early correlated data and containing the core collapse 
phase for all models. We tested our method with a range 
of starting times and find in general very good agreement. 
On average differences are ~3%, for few extreme cases up 
to ~15%, with a trend of increasing offsets with increasing 
starting times. This changes only occasionally the classifica- 
tion of the test result into the significancy level categories 
defined in Sect. 13.11 mainly in cases where the p-value al- 
ready is close to a boundary between such categories. 



5 RESULTS 

5.1 Results for the "standard runs" 

Fig. [1] (top left) shows that core-collapse occurs at around 
320 N-body time units. This coincides well with the often- 
used criterion of the first occurrence of a binary with binding 
energy higher than 100 kT (see Table [3}. At the same time, 
other structural parameters start to change as well (e.g. the 
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Figure 1. Comparison of "standard runs" simulations using STARLAB (green/grey) vs NBODY 4 (black). The lines show the median values, 
the error bars give the uncertainty ranges from the 50 individual runs. Shown are the time evolutions of the core radius (top left), 
half- mass radius (top right), potential energy (bottom left) and kinetic energy (bottom right). 
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Figure 2. Comparison of binary parameters from "standard runs" after 1000 N-body time units (=well after core collapse) using STARLAB 
(green/grey) vs NBODY 4 (black). Shown are the cumulative distributions of the semi-major axis (top left), the eccentricity (top right), the 
distance from the cluster centre (in units of the cluster's core radius; bottom left) and the binding energy (bottom right). The lines show 
the data, the error bars give the uncertainty ranges from bootstrapping. For the eccentricity, the prediction for a thermal distribution 
(e 2 ) is overplotted. 
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cluster's half- mass radius, shown in Fig.[T] top right), as the 
occurrence of hard binaries starts heating the cluster core, 
propagating outwards, resulting in an irreversible overall ex- 
pansion of these rather low-mass clusters. 

At the time of core collapse, also both kinetic and poten- 
tial energy of the cluster as a whole change their behaviour: 
the potential energy increases, while the kinetic energy de- 
creases slightly, as energy gets increasingly locked up in bina- 
ries (Fig. [1] lower panels) . This effect will be studied further 
in Sect. I5T41 where the effects of rebuilding the binary tree 
structure is discussed. 

The results obtained with starlab and with nbody4 
lie for most studied parameter within the combined uncer- 
tainty ranges. However, for some parameters small system- 
atic offsets seem to be present. To test their significance we 
developed a bootstrapping algorithm for comparing func- 
tions, described in Sect. 3] 

The results from this bootstrapping test are presented 
in Table [1] In this Table we give for various parameters the 
fraction of test runs made with the same settings (i.e. repre- 
senting the null hypothesis of a unique parent distribution) 
that are more deviating (i.e. which have larger A12 respec- 
tively Fi2) than the results for this parameter from the main 
simulations using NBODY4 or starlab. 

Except for the total energy E to t (and the conservation 
of the total energy <5E to t), which will be discussed below 
(Sect. [6|, solely the core radius evolutions (and quantities 
calculated from the core radius) are significantly discrepant. 
This discrepancy originates from a kink in the median tem- 
poral core radius evolution for the nbody4 simulations. This 
kink could not be traced back to any kink/jump in individ- 
ual runs (on the contrary, some starlab runs have stronger 
jumps than any of the NBODY4 runs). We rather expect this 
kink to be an unfortunate cumulative stochastic effect. This 
is supported by the fact that after the kink the temporal 
dependencies from starlab and NBODY4 continue to evolve 
in parallel, though offset by the amount the kink caused. 
The differences in the evolutions of the kinetic energies ap- 
pear to be larger than for the core radii, however, as also the 
uncertainties are larger these differences are not statistically 
significant. 

In Fig. [2] we study the distribution of the parameters 
describing the dynamically created binaries. For the lk stan- 
dard runs core collapse occurs at approximately 320 N-body 
time units (see Fig. [T] and Table [3}. We show the parameter 
distributions of binaries present at 1000 N-body time units, 
hence well after core collapse. The shown error bars are es- 
timated from 10000 bootstrap realisations of each dataset. 
Visually, the distributions compare well. 

In order to quantify differences between the parame- 
ter distributi ons using eith er starlab or nbody4 we used a 
Kuiper test (|Kuiperlll962l . i.e. a n advanced KS te st, for KS 
test see e.g. Numerical Recipes iPress et al.lll992t ). The re- 
sults are presented in Table [5] Given are the total numbers 
of binaries per set of simulations and the Kuiper test results 
in %. A Kuiper test result is the probability that 2 distribu- 
tions are drawn from the same parent distribution. Some of 
the results seem to be inconsistent with being drawn from 
the same parent distribution. However, we have performed 
a large number of comparisons with the Kuiper test. This 
unavoidably leads to the problem of multiple testing, which 
can basically be understood such that if we perform 100 in- 



dependent tests, a fraction of the order of 10% of p-values 
below 0.1 will arise by chance even if none of the null hy- 
pothesis of the tests would be wrong. Moreover, the small 
p-values occur for tests with sample sizes ~ 50, which is just 
at the lower limit for reliable results with the Kuiper test. 
Hence, in view of the small number of "significantly small" 
p-values (these are marked coloured in Table [2} , we con- 
clude that we do not find significant evidence for deviations 
of interesting size of the properties of dynamically created 
binaries from NBODY4 and starlab simulations. 

We also tested the eccentricity distributions against the 
common assumption of a thermal distribution, which is an 
eccentricity distribution ~ 2*e, or a cumulative distribution 
~ e 2 . A thermal distribution is g enerally expe cted based on 
phase space arguments (see e.g. lHeggielll975l ). The results 
are given in the last two columns of Table [2] for starlab 
and NBODY4 respectively, and show good agreement with a 
thermal distribution. However, cumulative distributions of 
higher polynomial order than e 2 are not rejected either by 
the Kuiper test. We therefore used the binary data obtained 
as by-product from the calculations of the bootstrap test 
clusters, using starlab only. The number of runs is a fac- 
tor 6 higher than for our main simulations, hence statistics 
also for the binaries is greatly enhanced (total number of 
binaries is 2018, compared to 323 for the main simulations). 
We test their cumulative eccentricity distributions against 
a number of power-law distributions e". For the STD test 
clusters we find a range in 01 — 2.1 - 2.8 with Kuiper test 
probabilities > 10%, with a probability > 95% for a = 2.3 
- 2.5, hence significantly biased towards larger eccentricities 
than a thermal distribution would predict (a thermal dis- 
tribution has a Kuiper test probability = 4.16%, hence is 
significantly rejected). 

We split the whole sample in thirds, based on the semi- 
major axis, the distance from the cluster centre and the 
binding energy. However, due to the reduction of the number 
of binaries in each of these subsets, the Kuiper test does not 
reject the null hypothesis of the eccentricity distributions 
being thermal on a significant level (except for the subset 
of binaries with intermediate semi-major axes, which has a 
p- value of 2.5%). The p- value curves are too broad to derive 
any trends. 

The general agreement between the data obtained using 
either starlab or nbody4 is good. 

The distributions just after core collapse give compara- 
ble results (except for the spatial distribution, as the binaries 
did not yet have enough time to escape the cluster centre 
significantly), although the number of binaries is smaller (i.e. 
statistics is poorer). 

5.2 Results for the "MF10 runs" 

The results for the "MF10 runs" are presented in Fig. [3]- [4] 
For these simulations core collapse occurs at around GO- 
TO N-body time units (see Fig. [3] [upper left panel] and Ta- 
ble [3]). Qualitatively both the structural behaviour and the 
properties of the binaries compare well with the standard 
case, except for the speed up the mass function causes. Only 
the binary binding energies are higher as compared to the 
standard runs (by a factor ~ 2), although the form of the 
cumulative distribution of the binding energies is compara- 
ble. 
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Both the bootstrap test for structural/energy parame- 
ters (except for the energy conservation, see Sect. [6} and the 
Kuiper test for the binary properties prove the very good 
agreement between the results obtained from the NBODY4 
and STARLAB simulations. 

We tested again the hypothesis of a thermal eccentricity 
distribution of the dynamically created binaries. For both 
the NBODY4 and the STARLAB main runs, the Kuiper test 
yields probabilities which do not reject the hypothesis of 
a thermal eccentricity distribution. Again, we used the bi- 
nary data obtained as by-product from the calculations of 
the bootstrap test clusters, using STARLAB only. The total 
number of binaries is 1207, compared to 191 for the main 
simulations. We test their cumulative eccentricity distribu- 
tions against a number of power-law distributions e a . For 
the MF10 test clusters we find a range in a = 2.4 - 3.0 with 
Kuiper test probabilities > 10%, with a maximum probabil- 
ity = 80.7% for a = 2.8, hence significantly biased towards 
larger eccentricities than a thermal distribution would pre- 
dict (a thermal distribution has a Kuiper test probability = 
0.06%, hence is highly significantly rejected). 

We again split the whole sample in thirds, now also 
based on the primary mass and the mass ratio. We find that 
binaries with small semi-major axis, high binding energy 
or massive primaries tend to favour distributions closer to 
a thermal distribution than binaries with large semi-major 
axis, low binding energy or low-mass primaries. This can 
qualitatively be understood by assuming those binaries to 
be the ones with the most past encounters, hence the higher 
probability to thermalise. While the binaries with large semi- 
major axis, low binding energy or low-mass primaries are 
still highly significantly rejected, binaries with high binding 
energy are highly significantly rejected, binaries with high 
primary mass are weakly significantly rejected and binaries 
with small semi-major axis are not rejected to be consistent 
with a thermal distribution. The distance from the cluster 
centre (both binaries close to the cluster centre and in the 
far cluster outskirts are significantly rejected) and the mass 
ratio of the binary (~10 per cent, i.e. very weakly signifi- 
cantly rejected) have only small effects. 

5.3 Results for the "MF100 runs" 

The results for the "MF100 runs" are presented in Fig. [5]- 

El 

For these simulations core collapse occurs at around 20 
N-body time units (see Fig. [5] [upper left panel] and Table 
O, the wider mass function (compared to "MF10" runs) fur- 
ther speeding up the evolution. Qualitatively both the struc- 
tural/energy parameters and the properties of the binaries 
compare well with the standard case. Only the binary bind- 
ing energies are again higher as compared to the "MF10" 
runs, and the cumulative distribution of the binding energies 
is steeper, biased to higher energies. In general, the MF100 
runs show larger scatter compared to the other runs. This 
is due to the larger stochastic effects for the highest masses 
caused by the wider mass range. 

Both the binary properties and the structural/energy 
parameters are in good agreement. Weakly significant incon- 
sistency is found for the half-mass radius and the potential 
energy evolution only. For these runs, even the energy con- 
servation is not inconsistent, as the main differences occur 



only before core collapse, at ages which are largely removed 
by skipping the first 10 N-body time units for the bootstrap- 
ping. 

For the MF100 test clusters we also test the eccentric- 
ity distribution of unperturbed binaries (again for the test 
statistics clusters, hence 485 clusters instead of 62 clusters 
in the main simulations) against various power-law relations 
and find a range in a = 1.9 - 3.0 with Kuiper test probabili- 
ties > 10%, with a plateau of probability > 95% for a = 2.3 
- 2.6. A thermal distribution has a Kuiper test probability 
= 28.1%, hence can not be rejected. 

We again split the complete sample in thirds and em- 
ploy Kuiper tests to quantify the probability of the sub- 
samples' eccentricity distribution being thermal. Binaries 
are consistent with a thermal eccentricity distribution for: 
small semi-major axes, high binding energies, high primary 
mass, large distances from the cluster centre and (to a lesser 
extend) large mass ratios. For each of those subsets when 
compared with a thermal distribution a Kuiper test gives p- 
values > 80 per cent. Except for the small mass ratio subset 
(which is comparable with a thermal distribution at p- value 
ss 40 per cent) for the opposite subsets a thermal distri- 
bution is at least weakly significantly, if not more strongly, 
rejected. 

5.4 Using STARLAB "MF10 runs" to test for 
possible biases introduced by analysis 
procedure 

We checked whether the binary reconstruction with STAR- 
lab/kira introduced spurious effects. 

The structural parameters are largely unaffected. Minor 
differences in the binary parameters originate in the inclu- 
sion of perturbed binaries into the sample before full binary 
reconstruction. 

The main differences occur for the energies. Before bi- 
nary reconstruction, binaries are treated as 2 separate stars. 
Their orbital velocities contribute therefore to the total ki- 
netic energy of the cluster. Their binding potential energy 
constitutes a significant part of the clusters total potential 
energy. In the case of a fully reconstructed binary tree struc- 
ture both the orbital velocities' kinetic energies and the 
binding potential energies are treated separately from the 
total cluster values, leading to an apparent "loss" of total 
energy. 

However, the temporal parameter evolutions derived 
from STARLAB vs NBODY4 simulations show very similar 
bootstrap results both before and after the binary recon- 
struction. We therefore conclude that the binary reconstruc- 
tion does not lead to systematical changes. 

In addition, we checked whether the splitting into and 
analysis of single snapshots introduces systematic differ- 
ences. We pass the full 1000-snapshots STARLAB output 
through the STARLAB analysis routine HSYS_STATS (like we 
do for the single snapshots) and statistically compare the re- 
sults with the results from the single-snapshot approach. We 
find slight differences induced by the resetting of the centre- 
of-mass for the single-snapshot approach, of which none is 
statistically significant (except for the centre-of-mass itself) . 
Likewise, the binaries (both perturbed and unperturbed) do 
not show significant differences. Alone the multiples' prop- 
erties show significant differences (due to the spurious re- 
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Figure 3. Comparison of "MF10 runs" simulations using STARLAB (green/grey) vs NBODY 4 (black). The lines show the median values, 
the error bars give the uncertainty ranges from the 50 individual runs. Shown are the time evolutions of the core radius (top left, bottom 
lines), half-mass radius (top left, top lines), the mean object mass in the core (top right), potential energy (lower left panel) and kinetic 
energy (bottom right). 
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Figure 4. Comparison of binary parameters from "MF10 runs" after 1000 N-body time units (=well after core collapse) using STARLAB 
(green/grey) vs NBODY 4 (black). Shown are the cumulative distributions of the semi-major axis (top left), the eccentricity (top right), 
the secondary-to-primary mass ratio (bottom left) and the binding energy (bottom right). The lines show the data, the error bars give 
the uncertainty ranges from bootstrapping. 
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Figure 5. Comparison of "MF100 runs" simulations using STARLAB (green/grey) vs NBODY 4 (black). The lines show the median values, 
the error bars give the uncertainty ranges from the 50 individual runs. Shown are the time evolutions of the core radius (top left, bottom 
lines), half-mass radius (top left, top lines), the mean object mass in the core (top right), potential energy (lower left panel) and kinetic 
energy (bottom right). 
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Figure 6. Comparison of binary parameters from "MF100 runs" after 1000 N-body time units (=well after core collapse) using STARLAB 
(green/grey) vs NBODY 4 (black). Shown are the cumulative distributions of the semi-major axis (top left), the eccentricity (top right), 
the secondary-to-primary mass ratio (bottom left) and the binding energy (bottom right). The lines show the data, the error bars give 
the uncertainty ranges from bootstrapping. 
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construction of a small number of multiples far out of the 
cluster centre), and should be treated with caution. 



6 ENERGY CONSERVATION 

Ideally, the total energy should be constant during each sim- 
ulation (except for the locking up of energy in binaries), 
and hence also in the median datasets. In practice, numer- 
ical inaccuracies lead to changes in the total energy. This 
energy error can be seen e.g. as the change of total energy 
per N-body time unit. These errors are larger if higher ac- 
curacy and hence more timesteps per N-body time unit are 
required, e.g. during core collapse and close encounters. 

After core collapse, the energy error rises steeply, in- 
dicating that after core collapse, the errors are dominated 
by close encounters and binaries (and more timesteps per 
N-body unit are required). Prior to core collapse the errors 
originate from inaccuracies of the Hermite integrator alone. 
For ages well after core collapse, the energy error steadily de- 
creases as the cluster expands and the number of timesteps 
per N-body unit drops. 

The energy conservation from NBODY4 shows roughly 
the same shape as the one from STARLAB (see Fig. 0. 
However, before core collapse the energy conservation from 
NBODY4 is systematically worse than from STARLAB, roughly 
a factor 3 - 10, with median deviations of 4 - 5 (~2 - 3 
for MF100). After core collapse, the median deviations are 
for all sets of simulations well below a factor 2. However, 
the overall scatter strongly increases (by factors 30 - 2000) 
due to the non-conservative numerical effects during binary 
interactions. Interestingly, the STARLAB data show similar 
energy conservation compared to the NBODY4 data, despite 
the much more sophisticated and programming expensive 
regularisation treatment in NBODY4. 

Performing an integration with the KS routines 
switched off shows that the larger energy errors of NBODY4 
prior to core collapse come from the KS formalism (possibly 
from the interface between the main integrator and the KS 
algorithm), although no details or solutions could be found. 
However, the energy conservation in NBODY4 is still good, 
and clearly sufficient for most applications. 
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Figure 8. Comparison of simulations using STARLAB (green/grcy) 
vs NBODY 4 (black). Shown is the time at which core collapse oc- 
curs (i.e. the time at which the first binary with more than 100 
kT binding energy is formed) vs the "depth" of core collapse (i.e. 
the core radius at core collapse time). 



Table 3. Values of core collapse, i.e. at the time when the first 
binary with binding energy > 100 kT occurs. Given are the time 
of core collapse t cc , and the core radius at this time r c ,cc Shown 
are the median value from the individual runs, the uncertainty 
ranges are the 16 and 84 quantiles, equivalent to lcr ranges of the 
mean. 
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ation due to binaries becomes more efficient for higher mass 
binaries. 



7 CORE COLLAPSE 

For the three settings investigated in this study (i.e. "stan- 
dard", "MF10" and "MF100"), the parameters of core col- 
lapse are presented in Fig. [8] and Table [3] For each setting, 
the values as derived from STARLAB are consistent within the 
la uncertainties with the values as derived from NBODY4. 

We find a power-law dependence between time and 
"depth" of core collapse, with a ~ -0.3 ± 0.2. This behaviour 
can be understood qualitatively as follows: Core collapse is 
driven by the most massive stars, which are sinking towards 
the centre of the cluster on a timescale that is a fraction 
1/M of the relaxation time. Hence core collapse will occur 
faster in clusters with a broader mass spectrum. For clus- 
ters with more massive stars, the core also reaches a state 
where finite-N effects become important. At the same time 
core collapse is halted at lower densities since energy gener- 



8 ESCAPING STARS 

As a final check we compare the specific potential and ki- 
netic energies of stars escaping the cluster, as especially their 
kinetic energies might sensitively depend on details in the 
treatment of binaries and close encounters in general. 

To this end we determine for each star in each simula- 
tion its potential and kinetic energy from its position and ve- 
locity within the cluster, and divide by the mass of the star. 
We choose to compare only data at an age of 1000 N-body 
time units, hence sufficiently after core collapse. Results are 
presented in Fig. [9] 

The left panels depict the distribution of kinetic versus 
potential energy from our STARLAB runs for the different 
mass function settings. A clear bifurcation is visible: the up- 
per branch/edge consists of stars which are barely unbound 
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Figure 7. The energy conservation for NBODY 4 (black lines) and STARLAB (green lines) for: "standard runs" (top), "MF10 runs" (middle), 
and "MF100 runs" (bottom). The right panels show the respective energy error ratios NBODY4/STARLAB. A horizontal lines as y=l is 
added, to guide the eye. 



Table 1. Results from the bootstrap test. Given are the fractions (in %) of the test distributions more deviating than the main simulations, 
i.e. the smaller this number the less alike the distributions are. For details see text. 
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Table 2. Kuiper test results. Given is the probability (in %) that for the given setup the binary properties from STARLAB vs. NBODY4 
are drawn from the same distribution. Quantities are: axis = semi-major axis, ecc = eccentricity, Dl = distance of the binary from 
cluster centre, in units of the core radius, , D2 = distance of the binary from cluster centre, in units of the half-mass radius, E/kT = 
binding energy in E/kT, m2/ml = mass ratio secondary mass / primary mass. #1 = number of binaries in the STARLAB simulation, #2 
= number of binaries in the NB0DY4 simulation. In addition, e 2 ^ and e 2 NB are the probabilities, that respectively the STARLAb/nbody4 
data for the eccentricity distributions are compatible with a cumulative distribution of the form e 2 (equivalent to a thermal eccentricity 
distribution ~ 2*e). For a description of the setups see text. 
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(and a fraction of those might become recaptured by the 
cluster) . With time these barely unbound stars migrate out- 
wards in the cluster, into cluster regions with low specific 
potential energy (i.e. downwards in the left panels of Fig. 
O. Stars with higher (specific) kinetic energy migrate faster, 
hence in the same time reach farther distances (i.e. regions 
characterised by lower specific potential energy). Core col- 
lapse is the earliest and by far strongest event leading to 
the unbinding of stars. Hence, the lower branches consist of 
stars which became unbound during core collapse, as they 
had the longest time to travel farthest. Stars in between the 
two main branches became unbound after core collapse. 

There are two main mechanisms to accelerate stars 
sufficiently to become unbound: multiple weak encoun- 
ters ("evaporating stars") and single/few strong encounters 
("ejected stars", usually star-binary or binary-binary inter- 
actions). In the case of clusters with a stellar mass function, 
"ejected" stars can originate either i) from a scattering event 
of a single star on a binary (here the energy gain for the 
scattered star is small, as the energy gain originates from the 
shrinking binary orbit alone) or ii) from an exchange interac- 
tion (in this case the energy gain for the ejected star is large, 
as the energy gain originates from the orbit shrinking and 
the change in potential energy, as primarily a low-mass bi- 
nary component is exchanged by a high-mass intruder star) . 

For an in-depth study o f "evaporating" vs. "ejected" 
stars see iKupper et all (|2008l ). On average, "ejected" stars 
have higher kinetic energies than "evaporating" stars. Es- 
pecially the stars with the highest kinetic energies are ex- 
clusively "ejected" stars. The maximum velocity gain a star 
can get during an interaction with a binary is of the order 
of the maximum orbital velocity of the binary stars. For an 
equal-mass binary the orbital velocity is directly related to 
the hardness of the binary, while for unequal-mass binaries 
the mass ratio between the binary stars plays the dominant 
role. The hardening of a binary is a long process. Hence, for 
equal-mass systems highly energetic ejected stars can only 
appear well after core collapse (and the formation of the 
first binaries) , while for systems with a stellar mass function 



they can appear already right at core collapse. This effect 
is seen in Fig. upper left panel: for the standard models, 
the lower branch shows a break at a specific kinetic energy 
~ -0.3, where stars with higher kinetic energy are slightly 
offset towards higher potential energies (i.e. closer to the 
cluster centre). This can be understood if these high-energy 
"ejected" stars left the cluster after the "evaporating" stars 
with lower kinetic energies (and therefore had less travel 
time), due to the time required for a sufficient hardening of 
the binaries. 

The distinction into "evaporating" and "ejected" stars 
can also be seen in Fig. |§J upper right panel, which show 
a pronounced dip between these constituents. For systems 
with a stellar mass function, the contribution from "scat- 
tered" stars increases for wider mass functions, as the prob- 
ability of a low-mass star being scattered at a high-mass bi- 
nary increases. This increases the contribution of stars with 
intermediate kinetic energies, filling up the dip seen in Fig. 
[9] upper right panel. In addition, the "exchanged" stars can 
get up to higher kinetic energies for wider mass function, 
as the possible energy gain due to the change in potential 
energy increases. This is seen at the high-energy end of the 
distributions in Fig. [9] right panels. 

We employ again a Kuiper test to test for statistically 
significant differences between the NBODY4 and STARLAB en- 
ergy distributions, both for the potential and the kinetic 
energy. For the standard and the MF10 runs we find no 
statistically significant deviations. For the MF100 runs, we 
find probabilities of 3.1% (kinetic energy) and 0.24% (poten- 
tial energy), hence significant/highly significant deviations. 
Visual inspection shows that the energy distributions are 
slightly narrower for the STARLAB runs compared to the 
NBODY4 results. With ~ 10,000 stars in each sample, the 
Kuiper test gives a statistically significant difference. From 
a more detailed analysis of the binned distributions shown 
in Fig. |9j we find deviations between the two distributions of 
order < 2a for a determined from the Poisson distributions 
which are expected to describe the number of stars in each 
bin. 
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Figure 9. Specific kinetic and potential energy of escaping stars after 1000 N-body time units for standard setting (upper panels), MF10 
runs (middle panels) and MF100 (lower panels). Left panels: Shown is the absolute number of stars in a given kinetic-vs. -potential energy 
bin (logarithmic bins) for STARLAB simulations (logarithmic grey scaling). Right panels: Comparison of kinetic energy distributions from 
NBODY4 (black lines) and STARLAB simulations (green lines). 



9 CONCLUSIONS 



We presented the first systematic in-depth study on how 
well results from the two major N-body codes nbody (here 
NBODY4) and STARLAB are comparable. We started with 
three sets of input models (50 initial configurations each, 
for three different stellar mass functions) and evolved these 
input models independently with NBODY4 and STARLAB. We 
analysed the results in a consistent way, and developed sta- 
tistical tools to quantitatively compare the median results 



of a variety of parameters (for each stellar mass functions) 
derived using the two codes. 

Overall, the agreement between the results obtained 
from the nbody 4 runs and from the STARLAB runs is very 
good. Statistically significant deviations were only found for 
the energy conservation before core collapse (where NBODY4 
is significantly worse, likely due to problems at the interface 
between the main integrator and the KS algorithm for close 
encounter treatment) and for the kinetic/potential energy 
distributions of escaping stars in the MF100 runs (with the 
STARLAB distributions being slightly narrower). 
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While testing the binary eccentricity distributions 
against the common assumption of a thermal distribution, 
we find good agreement for the main simulations, both for 
STARLAB and NBODY4. However, extending the number of 
test clusters (for STARLAB only), we find statistically signifi- 
cant biases towards higher eccentricities than a thermal dis- 
tribution would predict. These deviations are driven by the 
dynamically least evolved binaries, while stars with proba- 
bly the highest number of previous encounters tend to be 
more thermalised (though especially for the MF10 setting 
[i.e. a narrow mass range] statistically significant deviations 
remain for a number of subsets). 

We tested our approach for biases, potentially induced 
by splitting the simulations into single snapshots and by the 
binary tree reconstruction. None of these effects result in 
statistically significant deviations. 

In summary, we have shown that for purely dynami- 
cal N-body modelling results obtained from STARLAB and 
NBODY4 are consistent with each other, allowing to combine 
these results without introducing systematic effects. 

A similar study including stellar/binary evolution still 
needs to be performed. 
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